Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 25, 2018

What changes were proposed in this pull request?

This PR aims to supercede #22821 . The main author is @cloud-fan .

According to the discussion in RC4 voting thread and SPARK-25829, Spark current has a very weird behavior when we have duplicated keys in map. The newly added map-related functions in Apache Spark 2.4.0 make it worse. For instance, map_filter may return incorrect result like SPARK-25823

Before we entire fix the map behavior, we should not expose these functions to users.

  • map_entries
  • map_filter
  • map_zip_with
  • transform_values
  • transform_keys

How was this patch tested?

Pass the Jenkins.

"lower",
"lpad",
"ltrim",
"map_entries",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @felixcheung ?

sc = SparkContext._active_spark_context
return Column(sc._jvm.functions.map_entries(_to_java_column(col)))


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @HyukjinKwon and @BryanCutler ?

expression[ArrayAggregate]("aggregate"),
expression[TransformValues]("transform_values"),
expression[TransformKeys]("transform_keys"),
expression[MapZipWith]("map_zip_with"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @gatorsmile and @cloud-fan ?

public final class UnsafeRow extends InternalRow implements Externalizable, KryoSerializable {

public static final int WORD_SIZE = 8;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is added by map_entries

@dongjoon-hyun
Copy link
Member Author

To run the Jenkins faster, I create a standalone PR instead of making a PR to #22821 .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-25832][SQL] remove newly added map related functions [SPARK-25832][SQL] Remove newly added map related functions Oct 25, 2018
@dongjoon-hyun
Copy link
Member Author

Oh, I'm closing this in favor of #22827.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-25832 branch October 25, 2018 18:39
@SparkQA
Copy link

SparkQA commented Oct 25, 2018

Test build #98030 has finished for PR 22828 at commit 71d3b3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants